2  Getting set up with R and Rstudio

2.1 Learning Objectives

After completing this activity you should

  • be able to download and install R and Rstudio on your laptop
  • be able to install Rtools & devtools to be able to compile R packages from source (Windows).
  • understand the main use for each of the four main panes in the Rstudio GUI.
  • understand what a package is in R and how to install them.

2.2 Install & Set up R and Rstudio on your computer

If you have already installed R and Rstudio make sure your R version is up to date. Whenever you open Rstudio the version will be printed in the console (bottom left pane). In addition, you can always check what version is installed by typing sessionInfo() into your console. You should be using version 4.0.0 or later. You do not need to uninstall old version of R. If you do have to update, you will need to re-install packages (see below) for R4.0.0

2.2.1 Windows

Install R

  • Download most recent version of R for Windows here.
  • Run the .exe file that was downloaded and follow instructions in the set-up wizard.

Install Rtools

  • Download Rtools here.
  • Run the downloaded .exe file that was download and follow the instructions in the set-up wizard.

Install Rstudio

  • Go to Rstudio download page.
  • Scroll down to select the Rstudio current version for Windows XP/Vista/7/8/10.
  • Run the .exe file that was downloaded and follow instructions in the set-up wizard.

Finish setting up Rtools

  • Open Rstudio to make sure you aren’t getting any error messages.
  • Put Rtools in your path by typing writeLines('PATH="${RTOOLS40_HOME}\\usr\\bin;${PATH}"', con = "~/.Renviron") in the console window.
  • Install the devtools package by typing install.packages("devtools") in the console.

Install quarto

Download quarto using this link. Pick the file according to your operating system Run the downloaded .exe file that was download and follow the instructions in the set-up wizard.

2.2.2 Mac OS X

Download & install R

  • Go to (CRAN)[http://cran.r-project.org/], select Download R for (Mac) OS X.
  • Download the .pkg file for your OS X version.
  • Run the downloaded file to install R.

Download & install XQuartz (needed to run some R packages)

  • Download XQuartz
  • Run the downloaded file to install

Download & install Rstudio

  • Go to Rstudio download page.
  • Scroll down to select the Rstudio current version for Mac OS X.
  • Run the .exe file that was downloaded and follow instructions in the set-up wizard.

Install quarto

Download quarto using this link. Pick the file according to your operating system Run the downloaded .exe file that was download and follow the instructions in the set-up wizard.

2.3 Get to know Rstudio

Rstudio is an Integrated Development Environment (IDE) that you can use to write code, navigate files, inspect objects, etc. The advantage of using an IDE is that you have access to shortcuts, visual cues, troubleshooting, navigation, and autocomplete help.

2.3.1 GUI Layout

GUI stands for graphic user interface and refers to a type of user interface that allows users to interact with software applications and electronic devices through visual elements such as icons, buttons, windows, and menus, rather than using text-based command-line interfaces.

You have probably mostly interacted with computer programs through a GUI, where you interact with the system by manipulating graphical elements using a pointing device like a mouse, touch screen, or stylus. GUIs provide a more intuitive and user-friendly way for individuals to interact with computers and software because you can “see” what the effect of what you are doing is having. Graphical User Interfaces are a major departure from earlier text-based interfaces like command-line interfaces. They have contributed significantly to the widespread adoption of computers and software by making them more accessible to a broader range of users. GUIs are used in various types of software, from operating systems to applications like web browsers, image editors, word processors, and more.

Not too long ago, if you had wanted to learn R or another programming language you would have been working directly on a console instead of an IDE like Rstudio which has made coding a lot more accessible to beginners because you can more easily use scripts, interactively run code and visualize data.

Open Rstudio and identify the four panes in the interface (default layout).

  1. Editor (top left): edit scripts/other documents, code can be sent directly to the console.
  2. R console (bottom left): Run code either by directly typing the code or sending it from the editor pane.
  3. Environment/history (top right): Contains variables/objects as you create them & full history of functions/commands that have been run.
  4. files/plots/packages/help/viewer (bottom right): Different tabs in this pane wil let you explore files on your computer, view plots, loaded packages, and read manual pages for various functions.

The panes can be customized (Rstudio -> Preferences -> Pane Layout) and you can move/re-size them using your mouse.

Note

We are going to switch to have the Console in our top right and the Environment in the bottom left which makes it easier to see your code output and your script/quarto document at the same time.

2.3.2 Interacting with R in Rstudio

Think of R as a language that allows you to give your computer precise instructions (code) to follow.

  • Commands are the instructions we are giving the computer, usually as a series of functions.
  • Executing code or a program means you are telling the computer to run it.

There are three main ways to interact with R - directly using console, script files (*.R), or code chunks embedded in R markdown (*.Rmd) or quarto files (*.qmd). We will generally be working with the later.

The console is where you execute code and see the results of those commands. You can type your code directly into the console and hit Enter to execute it. You can review those commands in the history pane (or by saving the history) but if you close the session and don’t save the history to file those commands will be forgotten.

By contrast, writing your code in the script editor either as a standard script or as a code chunk in an quarto document allows you to have a reproducible workflow (future you and other collaborators will thank you).

Executing an entire script, a code chunk, or individual functions from a script will run them in the console.

  • Ctrl + Enter will execute commands directly from the script editor. You can use this to run the line of code your cursor is currently in in the script editor or you can highlight a series of lines to execute.
  • If you are using a quarto file you can execute an entire code chunk by pressing the green arrow in the top right corner.

If the console is ready for you to execute commands you should see a > prompt. If you e.g. forget a ) you will see a + prompt - R is telling you that it is expecting further code. When this happens and you don’t know what you are missing (usually it is an unmatched quotation or parenthesis), make sure your cursor is in the console and hit the Esc key.

We will run through these options, but you can always check back here while you are getting used to R.

2.3.3 Customize Rstudio

There are several options to customize Rstudio including setting a theme, and other formatting preferences. You can access this using Tools > Global Options. I recommend using a dark theme (it’s a lot easier on the eyes) and keeping the panes in the same positions outlined above because it will make troubleshooting a lot easier1.

  • 1 “You should see xx in the top left” is a lot more helpful if your top left looks like my top left!

  • 2.4 Installing and using packages in R

    2.4.1 Install a package

    Think of R packages or libraries as tool kit comprising a set of functions (tools) to perform specific tasks. R comes with a set of packages already installed that gives you base R functions; you can view these and determine which have been loaded in the Packages tab in the bottom right pane. For other tasks we will need additional packages. 2

  • 2 Most R packages are found in the CRAN repository and on Bioconducter, developmental packages are available on github.

  • A central group of packages for data wrangling and processing form the tidyverse, described as “… an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.” - We are going to heavily rely on core functions from the tidyverse to wrangle, summarize, and analyze data.

    When you install packages they will be downloaded and installed onto your computer. Determine what your default path is using .libPaths() and change if necessary.

    The easiest way to install packages directly in the console is to use the install.packages() function.

    Use the R console to install some libraries to get us started (we will install other libraries as needed for other labs).

    Using # in an R script allows you to insert comments that are ignored by R when executing your code. Use comments to document your code, future you will thank you! Before submitting any of your skills tests or homework assignments you should always go through and make sure each piece of code has a descriptive comment. You do not need to add a comment for multi-line code that you are stringing together using a pipe %>% but you should have one descriptive comment above the set of commands you are giving R and then make sure that you add any comments that you need to remember how the function works or which parameters might be useful to tweak/set differently if you were to reuse that code.

    # install central packages in the tidyverse
    install.packages("tidyverse")
    
    # install additional packages
    install.packages("plyr", "ggthemes", "patchwork", "glue")

    Let’s check if you were able to successfully install those packages by ensureing you can load them. Any time you start a new R session (e.g. by closing Rstudio and restarting it), you will need to load your libraries beyond the base libraries that are automatically loaded using the library() function in order to be able to use the functions specific to that package3.

  • 3 Troubleshooting tip: if you get an error along the lines of function() cannot be found the first thing you will want to do is check if your libraries are loaded!

  • # load library
    library(tidyverse)
    Warning: package 'tidyverse' was built under R version 4.3.1
    ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ✔ dplyr     1.1.2     ✔ readr     2.1.4
    ✔ forcats   1.0.0     ✔ stringr   1.5.0
    ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
    ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
    ✔ purrr     1.0.1     
    ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ✖ dplyr::filter() masks stats::filter()
    ✖ dplyr::lag()    masks stats::lag()
    ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

    If you don’t see any error messages in the console along the lines of there is no package called ... you are all set. If you look in the packages tab in the lower right panel you should also see that packages such as dplyr and tidyr (two of the central tidyverse packages) now have a little check box next to them.

    2.4.2 Updating R packages

    You should generally make sure to keep your R packages up to date as new versions include important bugfixes and additional improvements. The easiest way to update packages is to use the Update button in the Packages tab in the bottom right panel. Over the course of the semester you should not have to do this, but when you install new packages you might get message that some of your packages need to be updated which you can then either choose to do at that point or ignore.

    Warning

    Be aware that updating packages might break some code you have previously written. For most of what we will be doing this should not be the case. If you used R for a previous course, make sure to update you packages at the beginning of this course and we should be set for the semester.